Method and apparatus for assessing a translation

ABSTRACT

Methods, apparatus and computer program products are provided in order to assess a translation following performance of the translation. The methods, apparatus and computer program products may determine input segments of a source language document that may prove to be problematic from a translatability standpoint, such as the input segments of the source language document that may have multiple output variants. As such, methods, apparatus and computer program products may provide feedback to the author or owner of the source language document that may influence the generation of subsequent source language documents so as to have improved translatability.

TECHNOLOGICAL FIELD

Embodiments of the present disclosure relate generally to methods, apparatus and computer program products for assessing a translation and, more particularly, to methods, apparatus and computer program products for assessing a translation following performance of the translation so as to identify, for example, one or more segments of a source language document that may be problematic for translators.

BACKGROUND

Global organizations, among many others, depend on document translations. Translation in industrial sectors such as utilities, manufacturing, and transportation requires mastery of various technical disciplines, and translation errors or ambiguities can lead to financial and other adverse consequences. Some publication policies prescribe best practices for translating technical documentation into the languages of the receiving nations. These best practices usually permit authors or other document providers to exert control over the translation in a manner that balances cost with translation quality. However, this practice offers little control to an organization that produces line-of-business documents in only one language, especially when that organization's business model depends on foreign customers to translate received documents independently. Under this practice, source language documents are translated into target language documents by parties other than the owner of the source language document, even though the owner retains a proprietary interest in the quality of the translation notwithstanding the owner's limited knowledge of the target language.

In the absence of control over the translation itself, it could be beneficial for the owner of a source language document to draft the document so as to be more readily translatable. Translatability of a document denotes those properties of a source language document that increase the potential for successful translation of the source language document. Translation quality also depends on translatability, and several different techniques have been developed for assessing translation quality, typically in the context of predicting translation costs in advance of the actual translation. For example, round-trip translation may be applied casually to machine-translation (MT) systems. In round-trip translation, source language (SL) input is translated into target language (TL) output by an MT system. This output is then re-translated from the TL back into the initial SL, and the final translation product is then compared to the original input to assess the translation quality of the MT system. Human judgment may determine when round-trip translation inputs and outputs are semantically equivalent or divergent. Although once thought to be an indicator of translation quality, especially when evaluators lack TL knowledge, round-trip translation quality assessment is now considered less helpful, since round-trip translation fails to differentiate the distinct SL-TL and TL-SL contributions to the final translation product.

The relationship, or correlation, between translatability and translation quality is suggested by the dependency between translatability assessment and post-editing costs. In this regard, translatability assessment may be used to predict translation costs. Typically, when translatability scores match translation capabilities, pre- and post-editing cost estimates are minimal. Otherwise, more time and effort are deemed necessary for an acceptable translation product. In either case, translation quality is predicted as a function of SL translatability and translation cost. An understanding of this relationship is useful when deciding how to effect a translation and which technologies to apply when human translation is prohibitively expensive or otherwise infeasible.

Several studies have sought to understand the formal parameters of translatability, that is, those properties of SL input that increase the potential for successful translation. It has been suggested, for example, that authoring or pre-processing SL input with a controlled language (CL) enhances translatability. Translatability assessment of this kind typically identifies SL properties that act as impediments to translation. Usually these properties are aspects of SL non-compliance with CL specifications. Typically, non-compliance implicates lexical and grammatical restrictions that neutralize marked features of the SL from which the CL is adapted. In this way, the approach first assesses SL inputs with respect to an idealized, unmarked CL, which serves as a proxy for the actual TL. These studies eventually led to translatability assessment independent of the TL involved. Other studies employ machine learning to assess the translatability of SL inputs and reformulate them as necessary to enhance translatability. In general, the objective of these forms of translatability assessment is to predict the time and cost required for translation.

Translatability assessment techniques have therefore generally been utilized prior to translation so as to determine, for example, the manner in which to execute a translation task. The techniques described above may thus facilitate a determination as to how to effect a translation and which technologies to apply in an instance in which human translation is prohibitively expensive or otherwise infeasible. However, translatability assessment techniques have not been widely utilized for purposes other than pre-translation guidance, for example, to predict translation costs.

BRIEF SUMMARY

Methods, apparatus and computer program products are provided in accordance with embodiments of the present disclosure in order to assess a translation following performance of the translation. The methods, apparatus and computer program products of one embodiment may determine input segments of a source language document that may prove to be problematic from a translatability standpoint. As such, methods, apparatus and computer program products of the present disclosure may provide feedback to the author or owner of the source language document that may influence the generation of subsequent source language documents so as to have improved translatability.

In one embodiment, a method of assessing a translation is provided that includes aligning, with a processor, input segments of a source language document with corresponding output segments of a target language document. For each input segment, the method identifies variations between the output segments corresponding to a respective input segment. In this regard, the identification of the variations includes the identification of a reference translation and one or more output variants for the respective input segment. For example, the reference translation may be the output segment that most frequently corresponds to the respective input segment. The method of this embodiment also determines the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.

The method of one embodiment may also provide feedback regarding the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation. As such, the recipient of the feedback, such as the author or owner of the source language document, can take the feedback into account during the production of other source language documents to improve the translatability of those other source language documents. In one embodiment, the determination of the one or more input segments having corresponding output variants that fail to satisfy the control limit for translation variation may include the determination of a measurement of similarity between each output variant and the reference translation. The measurement of similarity may, in turn, be determined by determining a longest common subsequence between each output variant and the reference translation. Further, the measurement of similarity may be determined by determining a similarity metric based upon recall and precision of the longest common subsequence between each output variant and the reference translation. In one embodiment, the control limit is based upon the similarity metric.

In one embodiment, a computing device for assessing a translation is provided that includes a processor configured to align input segments of a source language document with corresponding output segments of a target language document. For each input segment, the processor is configured to identify variations between the output segments corresponding to a respective input segment. In this regard, the identification of the variations includes the identification of a reference translation and one or more output variants for the respective input segment. For example, the reference translation may be the output segment that most frequently corresponds to the respective input segment. The processor of this embodiment is also configured to determine the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.

The processor of one embodiment may also be configured to provide feedback regarding the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation. As such, the recipient of the feedback, such as the author or owner of the source language document, can take the feedback into account during the production of other source language documents to improve the translatability of those other source language documents. In one embodiment, the determination of the one or more input segments having corresponding output variants that fail to satisfy the control limit for translation variation may include the processor's determination of a measurement of similarity between each output variant and the reference translation. The measurement of similarity may, in turn, be determined by the processor determining a longest common subsequence between each output variant and the reference translation. Further, the measurement of similarity may be determined by the processor determining a similarity metric based upon recall and precision of the longest common subsequence between each output variant and the reference translation. In one embodiment, the control limit is based upon the similarity metric.

In one embodiment, a computer program product for assessing a translation is provided that includes at least one computer-readable storage medium having computer-executable program code portions stored therein. The computer-executable program code portions include program code instructions for aligning input segments of a source language document with corresponding output segments of a target language document. For each input segment, the computer-executable program code portions include program code instructions for identifying variations between the output segments corresponding to a respective input segment. In this regard, the identification of the variations includes the identification of a reference translation and one or more output variants for the respective input segment. For example, the reference translation may be the output segment that most frequently corresponds to the respective input segment. The computer-executable program code portions of this embodiment also include program code instructions for determining the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.

The computer-executable program code portions of one embodiment also include program code instructions for providing feedback regarding the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation. As such, the recipient of the feedback, such as the author or owner of the source language document, can take the feedback into account during the production of other source language documents to improve the translatability of those other source language documents. In one embodiment, the program code instructions for determining the one or more input segments having corresponding output variants that fail to satisfy the control limit for translation variation may include program code instructions for determining a measurement of similarity between each output variant and the reference translation. The measurement of similarity may, in turn, be determined by program code instructions for determining a longest common subsequence between each output variant and the reference translation. Further, the measurement of similarity may be determined by program code instructions for determining a similarity metric based upon recall and precision of the longest common subsequence between each output variant and the reference translation. In one embodiment, the control limit is based upon the similarity metric.

In accordance with embodiments of the present disclosure, a method, apparatus and computer program product are provided in order to assess a translation and to identify input segments of a source language document that may be problematic from a translatability standpoint. As such, authors, owners or other providers of source language documents may take into account the input segments that have poor translatability in order to subsequently produce other source language documents that are more readily translatable. The features, functions and advantages that have been discussed may be achieved independently in various embodiments of the present disclosure or may be combined in yet other embodiments, further details of which may be seen with reference to the detailed description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Having thus described embodiments of the present disclosure in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:

FIG. 1 is a flow chart illustrating operations performed in accordance with one embodiment of the present disclosure;

FIG. 2 is a flow chart illustrating operations performed in accordance with another embodiment of the present disclosure; and

FIG. 3 is a block diagram illustrating a computing device for performing operations in accordance with one embodiment of the present disclosure.

DETAILED DESCRIPTION

Embodiments of the present disclosure now will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments are shown. Indeed, these embodiments may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like numbers refer to like elements throughout.

A method, apparatus and computer program product are provided according to one embodiment of the present disclosure for assessing a translation of a source language document following the generation or production of the translation. Based upon the assessment of the translation, feedback may be provided to the author or owner of the source language document to indicate input segments of the source language document that are problematic from a translatability standpoint, such as those input segments that lend themselves to a plurality of different translations. Based upon this feedback, the source language document may be revised or other source language documents may be subsequently created that take into account the results of the translatability assessment so as to create source language documents that are more consistently and accurately translated.

While the methods, apparatus and computer program products of embodiments of the present disclosure may be utilized in a variety of situations, the methods, apparatus and computer program products of one embodiment are useful in an instance in which the author or owner of the source language document does not perform or otherwise have control over the translation of the source language document. For example, the author or owner of the source language document may create and provide a monolingual document to another party, such as a customer, a partner or the like. The other party may then translate the source language document independent of any input or control by the author or owner of the source language document. As a result of its authorship or ownership of the source language document, however, the author or owner of the source language document still has an interest in the quality of the translation to ensure that the content of the source language document is accurately and consistently reproduced in the target language. By acting upon feedback provided in accordance with embodiments of the present disclosure, the author or owner of a source language document may work to improve the translatability of subsequent source language documents, thereby reducing the risks associated with poor translations of the source language documents.

The methods, apparatus and computer program products of embodiments of the present disclosure generally identify input elements of the source language document that have poor translatability based upon the analysis of textual properties of a parallel pair of source language and target language documents. As shown in operation 10 of FIG. 1, a method of assessing a translation may initially align input segments of a source language document with corresponding output segments of a target language document. In this regard, the target language document is a translation of the source language document. The input and output segments that are aligned may be of various lengths. For example, the input and output segments may be sentences, phrases or other combinations of words and associated characters.

In the alignment process, an input segment of the source language document is aligned or matched with an output segment of the target language document that represents the same sentence, phrase or the like as does the input segment. Various alignment techniques may be utilized, such as that described at http://champollion.sourceforge.net. For example, an alignment technique may accept a parallel document pair, such as a source language document and a corresponding target language document, as an input and produce a bisegmentation relation that identifies mutual translation correspondences between segments of each document, such as between an input segment of the source language document and a corresponding output segment of the target language document. As noted, the granularity of the bisegmentation relations may vary, for example, among words, collocations, phrases, sentences or other textual units. In one embodiment, for example, an alignment technique may utilize a length-based probabilistic algorithm supplemented with a domain-specific source language-target language lexical resource to produce sentence alignments. See, for example, Peng Li, et al., “Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm”, Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010).
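The length-based idea can be illustrated with a deliberately simplified dynamic-programming aligner. This is a minimal Python sketch under stated assumptions, not Champollion or Fast-Champollion: it uses character length as its only signal, a hypothetical target-to-source length ratio `ratio`, a hypothetical fixed `SKIP_PENALTY` for unaligned segments, and no lexical resource. The names `align` and `bead_cost` are illustrative. It returns the bisegmentation relation as a list of (source indices, target indices) pairs, allowing 1-1, 1-0, 0-1, 2-1 and 1-2 correspondences.

```python
from typing import List, Optional, Tuple

# Allowed "beads": (source segments consumed, target segments consumed).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_PENALTY = 5.0  # hypothetical cost of leaving a segment unaligned


def bead_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Hypothetical cost: relative mismatch between expected and observed length."""
    if src_len == 0 or tgt_len == 0:
        return SKIP_PENALTY
    expected = src_len * ratio
    return abs(expected - tgt_len) / expected


def align(src: List[str], tgt: List[str],
          ratio: float = 1.0) -> List[Tuple[List[int], List[int]]]:
    """Return a bisegmentation relation as (source indices, target indices) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(s_len, t_len, ratio)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the cheapest path.
    pairs: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

For example, two four-character source segments against one eight-character target segment align as a single 2-1 bead, `[([0, 1], [0])]`. A production aligner would replace the placeholder cost with a probabilistic length model and, as the text notes, a lexical resource for the language pair.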

For each input segment, the method may identify variations between the output segments that correspond to the respective input segment, as shown in block 12 of FIG. 1. By way of illustration and without limitation or intent for aircraft or functional use, several input segments of an English language source document (designated “SL input”) are reproduced below in Table 1 along with the corresponding output segments of a Mandarin language target document (designated “TL output”) and the frequency (Freq) of occurrence of each output segment.

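TABLE 1
SL input                                                      TL output   Freq
Pitch attitude to remain outside the red RA regions           (a)         5
                                                              (b)         1
Present ADI pitch attitude is within the red RA regions       (a)         9
Traffic aircraft is either climbing or descending in excess   (a)         8
                                                              (b)         3
Traffic aircraft is providing altitude information            (a)         6
                                                              (b)         4
                                                              (c)         3
The Mandarin TL output text is not reproduced; (a), (b) and (c) denote the distinct output segments for each SL input.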

One of the input segments of the source language document, that is, “Present ADI pitch attitude is within the red RA regions”, has only a single corresponding output segment and therefore has no translation variations and, as a result, superior translatability. However, the other three input segments of the English language source document have two or more corresponding output segments in the Mandarin language target document. As such, these input segments that have multiple corresponding output segments have poorer translatability. Generally, however, some variation in the output segments of a target language document may be tolerable, while more substantial translation variations may be considered intolerable and indicative of poor translatability of the corresponding input segments of the source language document.

The relationship between input segments of a source language document and the corresponding output segments of a target language document that is reflected in Table 1 need not be presented to a user, but the underlying information regarding the corresponding output segments of the target language document and the frequency with which each of the corresponding output segments appears within the target language document may be utilized when assessing the translatability of the source language document. In order to assess the translation variations, the output segments of a target language document are reviewed to identify instances in which different output segments correlate to the same input segment. In this regard, those input segments of the source language document that have a single corresponding output segment are identified by the method as having no output variants. However, for each input segment of the source language document that has two or more corresponding output segments in the target language document, the method identifies a reference translation and one or more output variants. See operation 12 of FIG. 1. In this regard, the reference translation is generally the output segment corresponding to a respective input segment that occurs most frequently, while the other output segments corresponding to the same respective input segment are considered output variants. With respect to the example of Table 1, the “Pitch attitude to remain outside the red RA regions” input segment has a corresponding output segment that occurs most frequently, i.e., five times, and is identified as the reference translation, while the other corresponding output segment occurs less frequently, i.e., one time, and is identified as an output variant. As another example, the “Traffic aircraft is providing altitude information” input segment has a corresponding output segment that occurs most frequently, i.e., six times, and is identified as the reference translation, while the two other corresponding output segments occur less frequently, i.e., four and three times, and are identified as output variants.
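The grouping described above, in which the most frequent output segment for an input segment becomes the reference translation and the remaining distinct output segments become output variants, can be sketched in Python as follows. This is a minimal sketch that assumes the alignment step has already produced one (input segment, output segment) string pair per occurrence; the function name `reference_and_variants` and the first-occurrence tie-breaking rule are illustrative assumptions, not taken from the text. Because the Mandarin output text is not reproduced here, the usage lines use hypothetical placeholder strings such as `"zh-a"` with the frequencies of Table 1.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def reference_and_variants(
    aligned_pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[str, List[Tuple[str, int]]]]:
    """Map each input segment to (reference translation, [(variant, freq), ...]).

    The reference translation is the most frequent output segment for the
    input segment; every other distinct output segment is an output variant.
    Ties are broken by first occurrence (an assumption, not from the text).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src_segment, tgt_segment in aligned_pairs:
        counts[src_segment][tgt_segment] += 1
    result: Dict[str, Tuple[str, List[Tuple[str, int]]]] = {}
    for src_segment, counter in counts.items():
        ranked = counter.most_common()  # stable sort: ties keep insertion order
        reference, _ = ranked[0]
        result[src_segment] = (reference, ranked[1:])
    return result


pitch = "Pitch attitude to remain outside the red RA regions"
traffic = "Traffic aircraft is providing altitude information"
# Frequencies follow Table 1; "zh-a" etc. are hypothetical placeholder outputs.
pairs = ([(pitch, "zh-a")] * 5 + [(pitch, "zh-b")] * 1
         + [(traffic, "zh-c")] * 6 + [(traffic, "zh-d")] * 4 + [(traffic, "zh-e")] * 3)
summary = reference_and_variants(pairs)
print(summary[pitch])    # → ('zh-a', [('zh-b', 1)])
print(summary[traffic])  # → ('zh-c', [('zh-d', 4), ('zh-e', 3)])
```

An input segment with a single output segment, such as the “Present ADI pitch attitude” row, would simply map to its reference translation with an empty variant list.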

Thereafter, the method may determine the one or more input segments of the source language document that have corresponding output variants that fail to satisfy a control limit for translation variation, as shown in block 14 of FIG. 1. By judicious selection of the control limit, the amount of translation variation that is tolerable may be adjusted depending upon the circumstances surrounding the translation of the source language document to the target language document. The determination of the input segment(s) that have corresponding output variants that fail to satisfy a control limit for translation variation may be accomplished in various manners. In one embodiment, however, the method may determine the input segment(s) having corresponding output variants that fail to satisfy the control limit for translation variation by determining a measurement of similarity between each output variant and the reference translation. In this regard, the determination of the measurement of similarity may include a determination of the longest common subsequence between each output variant and the reference translation.

In this regard, each output segment that corresponds to a respective input segment may be construed as a string of words, and the similarity between the output segments varies directly with the length of the subsequence they have in common. In this embodiment, output segments that have longer subsequence commonality will be considered more similar than output variants that have shorter subsequence commonality. A subsequence of the reference translation X is any word sequence obtained from X by omitting zero or more of its words without reordering the words that remain. Expressed in terms of abstract sequences X, Y and Z, Z is regarded as a common subsequence of X and Y if Z is a subsequence of both X and Y. For example, if X equals {A, B, C, B, D, A} and Y equals {B, D, C, A, B}, the sequence {B, C, A} is a common subsequence of X and Y. See, for example, Thomas H. Cormen, et al., “Introduction to Algorithms,” Third Edition, MIT Press (2009). By way of example and without limitation or intent for aircraft or functional use, Table 2 represents the output segments (TL output) of a Mandarin language target document that correspond to an input segment of “Traffic aircraft is providing altitude information” from an English language source document.

TABLE 2
TL output   Tokenized TL output
(1)         (Mandarin text not reproduced)
(2)         (Mandarin text not reproduced)
(3)         (Mandarin text not reproduced)

As shown, the output segments may be tokenized in order to break the output segments into a plurality of words or other lexical units. By way of example, the first output segment may serve as the reference translation, with the second and third output segments being output variants of the reference translation. While the second and third output segments share a common subsequence with the first output segment for the words in sequential positions 0 and 1, the method may determine the longest common subsequence (LCS) for each output variant relative to the reference translation. In this regard, the longest common subsequence for the second output variant relative to the reference translation consists of the words in sequential positions 0, 1, 3 and 4. Similarly, the longest common subsequence for the third output variant relative to the reference translation involves the words in sequential positions 0, 1, 4 and 5. In general, for any two output segments X and Y with X being the reference translation, the length of the longest common subsequence of X and Y, denoted LCS(X, Y), is the maximum count of words that Y shares in common with X and which occur in Y in the same sequential order, but not necessarily consecutively, as they appear in X.
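LCS(X, Y) as defined above can be computed with the standard dynamic-programming recurrence described by Cormen et al. This is a minimal Python sketch that assumes the output segments have already been tokenized into lists of strings (the tokenizer itself is not shown); the function name `lcs_length` is illustrative. Only the previous row of the table is kept, so time is proportional to len(X) × len(Y) and extra space to len(Y).

```python
from typing import Sequence


def lcs_length(x: Sequence[str], y: Sequence[str]) -> int:
    """LCS(X, Y): length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)            # extend a match diagonally
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a token of x or of y
        prev = curr
    return prev[-1]


# Abstract example from the text: X = {A, B, C, B, D, A}, Y = {B, D, C, A, B}.
print(lcs_length(list("ABCBDA"), list("BDCAB")))  # → 3
```

The result of three shows that {B, C, A} is not only a common subsequence of the example sequences but also a longest one.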

In one embodiment, the determination of the measurement of similarity may include the determination of a similarity metric based upon the recall and precision, such as the weighted harmonic mean of the recall and precision, of the longest common subsequence (LCS) between each output variant and the reference translation. In this embodiment, the control limit may, in turn, be based upon the similarity metric. As described by Chin-Yew Lin, et al., “Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics”, Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), for a reference translation X of length m and an output variant Y of length n, the LCS-based recall R_lcs may be defined as:

${R_{lcs}\left( {X,Y} \right)} = {\frac{{LCS}\left( {X,Y} \right)}{m}.}$

Additionally, the LCS-based precision P_lcs may be defined as:

${P_{lcs}\left( {X,Y} \right)} = {\frac{{LCS}\left( {X,Y} \right)}{n}.}$

Additionally, a weighting value β may be defined as:

$\beta = {\frac{P_{lcs}\left( {X,Y} \right)}{R_{lcs}\left( {X,Y} \right)}.}$

Although a similarity metric may be determined based upon the recall and precision of the longest common subsequence in various manners, the method of one embodiment may determine a similarity metric F_lcs(X, Y), the weighted harmonic mean of R_lcs and P_lcs, as follows:

${F_{lcs}\left( {X,Y} \right)} = {\frac{\left( {1 + \beta^{2}} \right){R_{lcs}\left( {X,Y} \right)}{P_{lcs}\left( {X,Y} \right)}}{\left( {{R_{lcs}\left( {X,Y} \right)} + \beta^{2}} \right){P_{lcs}\left( {X,Y} \right)}}.}$

By way of example and with reference to the reference translation, i.e., TL output (1), and the output variants, i.e., TL outputs (2) and (3), of Table 2, the similarity metric F_lcs(X, Y) is 0.8 for TL output (2) relative to the reference translation and 0.73 for TL output (3) relative to the reference translation in an instance in which the weighting value β equals one. Thus, the similarity metric of this embodiment takes into consideration word count variations between the output variants and the reference translation and confirms human intuition that, from among output variants with the same LCS, the output variant having the same number of words as the reference translation has less variance from the reference translation than does an output variant that has a different number of words than the reference translation. As such, the longest common in-sequence n-gram information factored into the foregoing equation for the similarity metric F_lcs(X, Y) provides a target language output comparison metric having sensitivity to the empirical facts of linear precedence.
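The recall, precision and weighted harmonic mean defined above translate directly into code. This is a minimal Python sketch; the function name `f_lcs` and its argument names are illustrative. The token counts in the usage lines are an assumption: the Mandarin text of Table 2 is not reproduced here, but a five-token reference translation with output variants of five and six tokens, each sharing a four-token LCS with it, is one configuration that reproduces the 0.8 and 0.73 values quoted above.

```python
def f_lcs(lcs: int, m: int, n: int, beta: float = 1.0) -> float:
    """LCS-based similarity F_lcs(X, Y) of an output variant Y to reference X.

    lcs  -- LCS(X, Y), the length of the longest common subsequence
    m    -- number of tokens in the reference translation X
    n    -- number of tokens in the output variant Y
    beta -- weighting value; 1.0 weights recall and precision equally
    """
    if lcs == 0:
        return 0.0
    recall = lcs / m       # R_lcs(X, Y)
    precision = lcs / n    # P_lcs(X, Y)
    return ((1 + beta ** 2) * recall * precision) / (recall + beta ** 2 * precision)


# Hypothetical lengths: a 5-token reference, and variants of 5 and 6 tokens
# that each share a 4-token LCS with it (token counts assumed, not given).
print(round(f_lcs(4, 5, 5), 2))  # → 0.8
print(round(f_lcs(4, 5, 6), 2))  # → 0.73
```

With β equal to one the expression reduces to 2·LCS(X, Y)/(m + n), which is why the longer third variant scores lower than the second despite the same LCS.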

As noted above, the method may then utilize the similarity metric in order to define the control limit that establishes whether a translation variation is tolerable or intolerable. In one embodiment, the similarity measures for the plurality of output segments are presumed to be values of a normally distributed random variable and are aggregated so as to determine a control limit for translation variation between a source language document and the target language document. Thus, output segments of the target language document that satisfy the control limit may be considered tolerable or acceptable even if those output segments vary somewhat from the reference translation, while output segments that fail to satisfy the control limit may be considered intolerable as a result of their excessive variation relative to the reference translation.

While the control limit may be based upon the similarity metric in a variety of different manners, one example of the relationship between the similarity metric and the control limit is provided herein for purposes of example, but not of limitation. In this example, v_i is an output variant that occurs in a parallel document pair, that is, a pair consisting of a source language document and a corresponding target language document, with a total of m output variants, excluding those output segments that serve as reference translations. Additionally, x_i is the LCS-based similarity measure obtained from F_lcs(r_i, v_i) in an instance in which r_i is the reference translation for v_i. In this example, the method may determine the arithmetic mean of the absolute differences between each similarity measure x_i and its predecessor x_(i−1) according to the following equation:

${MR} = {\frac{\sum\limits_{i = 2}^{m}{\left| {x_{i} - x_{i - 1}} \right|}}{m - 1}.}$

In this regard, the foregoing equation determines the moving range (MR) of translation variation across the parallel document pair. This moving range value quantifies the average translation variation. The control limit for translation variation may, in turn, be based upon the moving range MR and, in one embodiment, the control limit may be determined as the product of the moving range MR and the multiplier 2.66. In this regard, the multiplier 2.66 may be obtained by dividing 3 by the anti-biasing constant for n=2 (d_2 ≈ 1.128), as described, for example, in Douglas Montgomery, “Introduction to Statistical Quality Control”, John Wiley & Sons (2005).
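The moving range and the 2.66 multiplier can be checked numerically. This is a minimal Python sketch; the function names `moving_range` and `control_limit`, the constant name `D2_N2`, and the four similarity values in the usage lines are hypothetical. The similarity measures are assumed to be supplied in document order, reference translations excluded, as in the definition above.

```python
from typing import Sequence

D2_N2 = 1.128  # bias-correction (anti-biasing) constant d2 for n = 2


def moving_range(xs: Sequence[float]) -> float:
    """MR: mean absolute difference between consecutive similarity measures."""
    if len(xs) < 2:
        raise ValueError("at least two similarity measures are required")
    total = sum(abs(xs[i] - xs[i - 1]) for i in range(1, len(xs)))
    return total / (len(xs) - 1)


def control_limit(xs: Sequence[float]) -> float:
    """Control limit as 2.66 * MR, where 2.66 is 3 / d2 rounded to two places."""
    multiplier = round(3 / D2_N2, 2)  # 2.66
    return multiplier * moving_range(xs)


# Hypothetical similarity measures x_1..x_4, in document order.
xs = [0.8, 0.73, 0.9, 0.6]
print(round(moving_range(xs), 2))   # → 0.18
print(round(control_limit(xs), 4))  # → 0.4788
```

The absolute differences here are 0.07, 0.17 and 0.30, so MR is 0.54 / 3 = 0.18 and the control limit is 2.66 × 0.18 = 0.4788.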

Once a control limit has been established for translation variation, such as 2.66 MR, the method of one embodiment may compare the similarity measure x_(i) for each output variant v_(i) with the control limit in order to determine the output variants, if any, that exceed the control limit and that are, therefore, considered to exceed the tolerable level of translation variation established by the control limit. In an instance in which one or more input segments of a source language document have output segment(s) that exhibit an intolerable translation variation, the method may provide feedback to the author or owner of the source language document, as shown in operation 16 of FIG. 1. The author or owner may then review the input segment(s) that give rise to the intolerable translation variation and consider how those input segment(s) could be rephrased or restructured to improve their translatability, either in another version of the same source language document or in future source language documents. Based upon the feedback provided in accordance with the method of one example embodiment, translation irregularities may be anticipated such that source language documents may be subsequently optimized for translatability. As such, the method may provide for increased cross-cultural equivalence between source language documents and target language documents.
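The comparison and feedback step amounts to scanning the output variants and grouping the flagged ones by the input segment that produced them. The sketch below assumes a hypothetical input format of (input segment identifier, x_(i)) pairs, one per output variant with reference translations excluded, and takes the control limit as a precomputed value. The comparison direction follows the wording above literally: a variant is flagged when its similarity measure exceeds the control limit.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def segments_for_feedback(
    measures: Sequence[Tuple[str, float]], limit: float
) -> Dict[str, List[float]]:
    """Group flagged output variants by the input segment that produced them.

    measures: hypothetical (input_segment_id, x_i) pairs, one per output
              variant v_i, with reference translations excluded.
    limit:    control limit, e.g. 2.66 * MR.
    Following the description's wording, a variant is flagged when its
    similarity measure x_i exceeds the control limit.
    """
    flagged: Dict[str, List[float]] = defaultdict(list)
    for segment_id, x_i in measures:
        if x_i > limit:
            flagged[segment_id].append(x_i)
    # Keys are the input segments to report back to the author or owner.
    return dict(flagged)
```

With illustrative pairs ("S1", 0.9), ("S2", 0.3), ("S1", 0.5) and ("S3", 0.95) and a limit of 0.44, segments S1 and S3 would be reported back to the author or owner.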

By way of a further example, FIG. 2 illustrates another representation of a method in which source language documents are produced, such as source language documents that include technical data. See operations 20 and 22 of FIG. 2. The source language documents of this embodiment may be provided to a recipient, such as a party other than the party that produced the source language documents. The recipient may translate the source language documents, including the underlying technical data, into a plurality of corresponding target language documents. See operations 24 and 26 of FIG. 2. In accordance with an embodiment of the present disclosure, the target language documents may be provided to the original producer of the source language documents and aligned with the corresponding source language documents. See operation 28 of FIG. 2. In this regard, input segments of a source language document may, in turn, be aligned with corresponding output segments of the target language document. For each input segment, variations between the output segments corresponding to the respective input segment may be identified and the frequency with which those output variations appear may be determined. See operation 30 of FIG. 2. Based upon the identification of the variations between the output segments corresponding to a respective input segment, a reference translation and one or more output variants may be determined for each input segment that has multiple corresponding output segments.
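Once segments are aligned, identifying the reference translation and the output variants reduces to frequency counting per input segment. The sketch below assumes alignment has already produced (input segment, output segment) pairs and, consistent with the claims, takes the most frequent output segment as the reference translation; every other distinct output segment is treated as an output variant. The function name and data layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def reference_and_variants(
    aligned: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[str, List[str]]]:
    """For each input segment, pick a reference translation and its output variants.

    aligned: (input_segment, output_segment) pairs produced by aligning the
             source document with each target document (alignment not shown).
    Only input segments with more than one distinct output segment are returned.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for source_seg, target_seg in aligned:
        counts[source_seg][target_seg] += 1  # frequency of each output variation

    result: Dict[str, Tuple[str, List[str]]] = {}
    for source_seg, freq in counts.items():
        if len(freq) < 2:
            continue  # a single consistent translation has no variants
        # most_common() orders by count; ties keep first-encountered order.
        ranked = [seg for seg, _ in freq.most_common()]
        result[source_seg] = (ranked[0], ranked[1:])
    return result
```

For example, if "Close the valve." is rendered twice as "Fermez la vanne." and once as "Fermer la vanne.", the first rendering becomes the reference translation and the second its single output variant.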

As described above, a control limit for translation variation may then be determined and the output variants may be compared to the control limit to determine whether the output variants vary excessively. See operations 32 and 34 of FIG. 2. In instances in which an input segment of a source language document is determined to have one or more output variants with excessive variation, such as by failing to satisfy the control limit, the method may provide feedback so that the producer of the source language document, such as its author or owner, may identify the input segments that have poor translatability and consider revisions to those input segments, or to similar input segments of other source language documents, in an effort to improve the translatability of those input segments and, correspondingly, of the source language document. As shown in operation 36 of FIG. 2, the potential revisions to an input segment of a source language document may include a revision or optimization of the technical data embodied within the source language document.

The methods described above and illustrated, for example, in FIGS. 1 and 2 may be implemented in an automated fashion, that is, without manual intervention, by a computing device, such as shown in FIG. 3. In this regard, the computing device of one embodiment of the present disclosure may include specifically configured processing circuitry, such as a specifically configured processor 40, and an associated memory device 42, both of which are commonly included in a computer or the like. In this regard, the method of embodiments of the present disclosure as set forth generally in FIGS. 1 and 2 can be performed by the processor executing computer program instructions stored by the memory device. The computing device can also include a user interface 44 including, for example, a display for presenting information and/or for receiving information relative to performing embodiments of the method of the present disclosure.

As noted above, the processor 40 may operate under control of a computer program product. In this regard, the computer program product for performing the methods of embodiments of the present disclosure includes a computer-readable storage medium, such as a non-volatile, non-transitory storage medium, and computer-readable program code portions, such as a series of computer instructions, embodied in the computer-readable storage medium.

In this regard, FIGS. 1 and 2 are flowcharts of methods, systems and program products according to embodiments of the present disclosure. It will be understood that each block or step of the flowchart, and combinations of blocks in the flowchart, can be implemented by computer program instructions. These computer program instructions may be loaded onto a computing device, such as shown in FIG. 3, or other programmable apparatus to produce a machine, such that the instructions which execute on the computing device or other programmable apparatus create means for implementing the functions specified in the flowchart block(s) or step(s). These computer program instructions may also be stored in a computer-readable memory, e.g., memory device 42, that can direct a computing device or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instructions which implement the function specified in the flowchart block(s) or step(s). The computer program instructions may also be loaded onto a computing device or other programmable apparatus to cause a series of operational steps to be performed on the computing device or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block(s) or step(s).

Accordingly, blocks or steps of the flowchart support combinations of means for performing the specified functions and combinations of steps for performing the specified functions. It will also be understood that each block or step of the flowchart, and combinations of blocks or steps in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.

Many modifications and other embodiments of the present disclosure set forth herein will come to mind to one skilled in the art to which these embodiments pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the present disclosure is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

1. A method of assessing a translation comprising: aligning, with a processor, input segments of a source language document with corresponding output segments of a target language document; for each input segment, identifying variations between the output segments corresponding to a respective input segment, wherein identifying the variations comprises identifying a reference translation and one or more output variants for the respective input segment; and determining the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.
 2. A method according to claim 1 further comprising providing feedback regarding the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.
 3. A method according to claim 1 wherein identifying the reference translation comprises identifying the output segment that most frequently corresponds to the respective input segment.
 4. A method according to claim 1 wherein determining the one or more input segments having corresponding output variants that fail to satisfy the control limit for translation variation comprises determining a measurement of similarity between each output variant and the reference translation.
 5. A method according to claim 4 wherein determining the measurement of similarity comprises determining a longest common subsequence between each output variant and the reference translation.
 6. A method according to claim 5 wherein determining the measurement of similarity comprises determining a similarity metric based upon recall and precision of the longest common subsequence between each output variant and the reference translation.
 7. A method according to claim 6 wherein the control limit is based upon the similarity metric.
 8. A computing device configured to assess a translation, wherein the computing device comprises a processor configured to align input segments of a source language document with corresponding output segments of a target language document, wherein the processor is also configured, for each input segment, to identify variations between the output segments corresponding to a respective input segment including identification of a reference translation and one or more output variants for the respective input segment, and wherein the processor is configured to determine the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.
 9. A computing device according to claim 8 wherein the processor is further configured to provide feedback regarding the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.
 10. A computing device according to claim 8 wherein the processor is configured to identify the reference translation by identifying the output segment that most frequently corresponds to the respective input segment.
 11. A computing device according to claim 8 wherein the processor is configured to determine the one or more input segments having corresponding output variants that fail to satisfy the control limit for translation variation by determining a measurement of similarity between each output variant and the reference translation.
 12. A computing device according to claim 11 wherein the processor is configured to determine the measurement of similarity by determining a longest common subsequence between each output variant and the reference translation.
 13. A computing device according to claim 12 wherein the processor is configured to determine the measurement of similarity by determining a similarity metric based upon recall and precision of the longest common subsequence between each output variant and the reference translation.
 14. A computing device according to claim 13 wherein the control limit is based upon the similarity metric.
 15. A computer program product for assessing a translation and comprising at least one computer-readable storage medium having computer-executable program code portions stored therein, the computer-executable program code portions comprising: program code instructions for aligning input segments of a source language document with corresponding output segments of a target language document; for each input segment, program code instructions for identifying variations between the output segments corresponding to a respective input segment, wherein identifying the variations comprises identifying a reference translation and one or more output variants for the respective input segment; and program code instructions for determining the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.
 16. A computer program product according to claim 15 further comprising program code instructions for providing feedback regarding the one or more input segments having corresponding output variants that fail to satisfy a control limit for translation variation.
 17. A computer program product according to claim 15 wherein the program code instructions for identifying the reference translation comprise program code instructions for identifying the output segment that most frequently corresponds to the respective input segment.
 18. A computer program product according to claim 15 wherein the program code instructions for determining the one or more input segments having corresponding output variants that fail to satisfy the control limit for translation variation comprise program code instructions for determining a measurement of similarity between each output variant and the reference translation.
 19. A computer program product according to claim 18 wherein the program code instructions for determining the measurement of similarity comprise program code instructions for determining a longest common subsequence between each output variant and the reference translation.
 20. A computer program product according to claim 19 wherein the program code instructions for determining the measurement of similarity comprise program code instructions for determining a similarity metric based upon recall and precision of the longest common subsequence between each output variant and the reference translation, wherein the control limit is based upon the similarity metric.